Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis
Abstract
We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), have been known to be useful only for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relations between a given pair of terms in a given collection: unrelated (car fruit), symmetrically related (car automobile), asymmetrically related with the first term being more specific than the second (banana fruit), and asymmetrically related in the other direction (fruit banana). We give theoretical evidence for the soundness of our criterion by showing that, in a simplified mathematical model, the criterion does the apparently right thing. We applied our scheme to the reconstruction of a selected part of the Open Directory Project (ODP) hierarchy, with promising results.
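To make the symmetric baseline mentioned in the abstract concrete, here is a minimal sketch of an LSI-style term similarity computed by truncated eigenvector (singular value) decomposition. It is not the criterion proposed in the paper, which the abstract does not spell out. The function name `lsi_term_similarity`, the toy term-document counts, and the choice of rank 2 are all assumptions made for illustration.

```python
import numpy as np

def lsi_term_similarity(term_doc, rank):
    """Rank-`rank` LSI approximation of the term-term similarity matrix.

    Illustrative helper (hypothetical name), not the paper's criterion.
    """
    # Thin SVD of the term-document matrix (rows: terms, columns: documents).
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the top `rank` singular directions, scaled by their singular values.
    term_vecs = u[:, :rank] * s[:rank]
    # Gram matrix of the reduced term vectors: one similarity per term pair.
    return term_vecs @ term_vecs.T

terms = ["car", "automobile", "banana", "fruit"]
# Toy counts (assumed data): rows follow `terms`, columns are six documents.
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],   # car
    [1, 2, 1, 0, 0, 0],   # automobile
    [0, 0, 0, 2, 1, 0],   # banana
    [0, 0, 0, 1, 2, 2],   # fruit
], dtype=float)

sim = lsi_term_similarity(term_doc, rank=2)
for i, a in enumerate(terms):
    for j, b in enumerate(terms):
        if i < j:
            print(f"{a:>10} / {b:<10} {sim[i, j]:8.3f}")
```

The resulting matrix is symmetric by construction, so it can separate related pairs (car automobile, banana fruit) from unrelated ones (car fruit) but carries no information about which term of a related pair is the more specific one. That missing direction is the gap the abstract says its criterion addresses.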